Some of the basic concepts of information theory are critically
reviewed in the light of a generalized formulation of the theory of
Markoff's chains, in which the initial and final states are sequences
of symbols of different lengths, and the occurrence of symbols is
governed by intersymbol correlation of finite range. In particular,
the conditions of ergodicity and the structure of "ergodic subsets"
of sequences of arbitrary length are carefully discussed. A
mathematical method is developed to determine the "range" and
"strength" of intersymbol correlation. A brief summary of the content
is given at the end of Section 1.






#1. Introduction 



The aim of this paper is to clarify some of the basic, but often 
carelessly used concepts of information theory, viz., the concepts of 
ergodicity, intersymbol correlation and redundancy. There are two ap- 
proaches to this problem-complex pertaining to probability. One is 
an empirical point of view, and probability here is understood in its 
statistical aspect. The other is an a priori point of view which 
deals with probability mainly in its predictive aspect. In the first 
standpoint, the entire population of messages in a language is sup- 
posed to be given, and the various probabilities are calculated by 
the actual frequencies of individual symbols or those of sequences 
of symbols. According to this method, a unique value of the proba- 
bility of appearance of a given symbol or a given sequence can be 
statistically determined. In the second point of view, an ensemble 
of messages is supposed to be engendered by the given correlation 
probabilities starting from a given initial symbol or a given initial 
sequence of symbols. In this case, the existence of a unique, non- 
vanishing value of the probability of appearance of a given symbol or 
a given sequence is not guaranteed, for it may vanish with increasing 
length of messages, and it may depend on the initial condition. Thus, 
the problem of ergodicity acquires foremost importance in this ap- 
proach. 

Our Section 2, dealing with the problem of ergodicity, is therefore
developed in the framework of the second point of view. Once the
nature of the ergodicity condition is clarified and this condition is
assumed to be fulfilled, a smooth passage from the second point of
view to the first becomes easy. Thus, our Section 3 on redundancy
can be interpreted in either point of view.

It is not implied by the foregoing paragraphs that the problem 
of ergodicity is irrelevant to the first standpoint or cannot be for- 
mulated in the framework of this standpoint. The situation is that 
the nucleus of the problem under consideration can be exhibited more 
directly and naturally in the second point of view. 

The usual theory of Markoff's chains, which is based on transition
probabilities from one state to another, is extended in this paper
to the case where the probability $Q(a_1, \ldots, a_{\nu-1} \mid a_\nu)$
of symbol $a_\nu$ appearing in a message is dependent on the
$(\nu - 1)$ immediately preceding symbols, $\nu$ being the range of
intersymbol correlation. A population of infinitely long messages is
considered to be engendered solely by this intersymbol correlation
probability $Q(a_1, \ldots, a_{\nu-1} \mid a_\nu)$ from a given
$(\nu - 1)$-symbol initial sequence. The problem of ergodicity then
pertains to the existence of a unique (i.e., independent of the
initial sequence), non-vanishing value of $P(a_1, \ldots, a_\mu)$,
which should give the probability that a $\mu$-symbol sequence
arbitrarily taken from the population is $(a_1, \ldots, a_\mu)$,
$\mu$ being not necessarily equal to $\nu$. This generalized problem
of ergodicity is discussed in our Section 2.

It is shown not only that finiteness of correlation range does
not warrant ergodicity, as is often erroneously assumed in existing
literature, but also that if $\mu < \nu$ the quantity P can have more
than one finite value depending on the initial sequence, a situation
which does not exist in the ordinary Markoff chains.

Under the conditions that guarantee the existence of a unique (whether
or not non-vanishing) value of P, a convenient quantity, called the
correlation index $W_\mu$, defined by Eq. (31), is introduced,
characterizing both the "range" and the "strength" of correlation.
First, it represents the "range", in the sense that the actual
correlation range is the maximum value of $\mu$ for which
$W_\mu \neq 0$. This criterion is of both theoretical and practical
interest. Theoretically, it determines the applicability of the
generalized theory of Markoff's chains, and practically, it can be
used to measure the existing correlation range in a given population
of messages.

Second, this quantity $W_\mu$ represents the "strength" of
correlation, in the sense that $W_\mu$ quantitatively measures the
decrease of information due to the existence of $\mu$-symbol
correlation as compared with the $(\mu - 1)$-symbol correlation.
Finally, the so-called redundancy is expressed in the form of a
compact series in ascending range-numbers of the correlation indices,
Eq. (42).

#2. Ergodicity

We assume the alphabet under consideration to consist of N symbols:
$S_1, S_2, \ldots, S_N$. We shall constantly use a mathematical
symbol:

$$Q(a_1, a_2, \ldots, a_m \mid a_{m+1}, \ldots, a_{n-1}, a_n), \tag{1}$$

where each one of $a_1, a_2, \ldots, a_n$ can be any one of the N symbols.

Definition I. The quantity denoted by (1) represents the probability
that the last $(n - m)$ symbols of a sequence of n symbols are
$(a_{m+1}, \ldots, a_n)$ when it is known that the first m symbols
of the sequence are $(a_1, \ldots, a_m)$.

By the very nature of probability, we have

$$\sum_{a_{m+1}, \ldots, a_n} Q(a_1, \ldots, a_m \mid a_{m+1}, \ldots, a_n) = 1. \tag{2}$$

If there is no correlation between symbols, the probability of
any place in a sequence being occupied by symbol $S_i$ is independent
of the preceding symbols. As a result, the only quantity which
determines a probability of the type (1) is $Q(S_i)$, which represents
the probability of symbol $S_i$ appearing at any one place. In this
case, we have:

$$Q(a_1, \ldots, a_m \mid a_{m+1}, \ldots, a_n) = Q(a_{m+1})\, Q(a_{m+2}) \cdots Q(a_n).$$

If the correlation extends, for instance, over three consecutive
symbols, and not more than three, then the probability of a place in
a sequence being occupied by symbol $S_k$ will depend on the two
symbols directly preceding it, but not on the symbols beyond these
two. This means that the quantities $Q(S_i, S_j \mid S_k)$ determine
the general probability (1):

$$Q(a_1, \ldots, a_m \mid a_{m+1}, \ldots, a_n) = Q(a_{m-1}, a_m \mid a_{m+1})\, Q(a_m, a_{m+1} \mid a_{m+2}) \cdots Q(a_{n-2}, a_{n-1} \mid a_n).$$

In general, we have the following theorem:

Theorem I. If the intersymbol correlation does not extend over more
than $\mu$ consecutive symbols in a sequence, we can factorize (1)
as follows:

$$Q(a_1, \ldots, a_m \mid a_{m+1}, \ldots, a_n) = Q(a_{m-\mu+2}, \ldots, a_m \mid a_{m+1})\, Q(a_{m-\mu+3}, \ldots, a_{m+1} \mid a_{m+2}) \cdots Q(a_{n-\mu+1}, \ldots, a_{n-1} \mid a_n). \tag{3}$$

This theorem can be used to define the "range-number" of intersymbol
correlation: this number $\nu$ is the minimum allowable $\mu$ in
the decomposition (3).
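As a minimal sketch of the factorization (3), the following Python function multiplies the successive correlation probabilities along a sequence. The dictionary layout of `Q`, the name `sequence_probability`, and the two-letter example alphabet with its numerical values are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction
from itertools import product

def sequence_probability(Q, prefix, suffix, mu):
    """Probability (1) of `suffix` following `prefix`, factorized as in (3).

    Q maps a (mu-1)-tuple of preceding symbols to {next_symbol: probability}.
    The prefix must supply at least mu-1 symbols of context.
    """
    if len(prefix) < mu - 1:
        raise ValueError("prefix must contain at least mu - 1 symbols")
    history = tuple(prefix)
    prob = Fraction(1)
    for symbol in suffix:
        # Only the last mu-1 symbols influence the next one.
        context = history[len(history) - (mu - 1):]
        prob *= Q.get(context, {}).get(symbol, Fraction(0))
        history += (symbol,)
    return prob

# Hypothetical range-3 source on the alphabet {a, b}.
Q_EXAMPLE = {
    ("a", "a"): {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    ("a", "b"): {"a": Fraction(1, 4), "b": Fraction(3, 4)},
    ("b", "a"): {"a": Fraction(1)},
    ("b", "b"): {"a": Fraction(1, 3), "b": Fraction(2, 3)},
}

# Q(a, a | b, b, a) = Q(a,a|b) * Q(a,b|b) * Q(b,b|a)
example_value = sequence_probability(Q_EXAMPLE, ("a", "a"), ("b", "b", "a"), 3)

# Normalization (2): summing over all two-symbol continuations gives 1.
total = sum(sequence_probability(Q_EXAMPLE, ("a", "a"), s, 3)
            for s in product("ab", repeat=2))
```

Exact rational arithmetic is used here only so that the normalization (2) can be checked without rounding error.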

Assuming the correlation to be of range $\nu$, we consider all
the possible sequences whose first $(\nu - 1)$ symbols are given to
be, say, $(a_1, a_2, \ldots, a_{\nu-1})$. Among these sequences
starting with $(a_1, a_2, \ldots, a_{\nu-1})$, we inquire the
probability of those sequences whose first $\nu$ symbols are
$(a_1, b_1, b_2, \ldots, b_{\nu-1})$. This probability is obviously
given by

$$R(a_1, a_2, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\nu-1}) = Q(a_1, \ldots, a_{\nu-1} \mid b_{\nu-1}) \quad \text{if } (a_2, \ldots, a_{\nu-1}) = (b_1, \ldots, b_{\nu-2}),$$

and otherwise

$$R(a_1, a_2, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\nu-1}) = 0.$$

In other words, the probability in question can be written in a matrix
form:

$$(a_1, a_2, \ldots, a_{\nu-1} \mid R \mid b_1, b_2, \ldots, b_{\nu-1}) = Q(a_1, \ldots, a_{\nu-1} \mid b_{\nu-1})\, \delta(a_2, b_1)\, \delta(a_3, b_2) \cdots \delta(a_{\nu-1}, b_{\nu-2}), \tag{4}$$

with

$$\delta(S_i, S_j) = 0 \ \text{ if } i \neq j, \qquad \delta(S_i, S_j) = 1 \ \text{ if } i = j.$$

Using this matrix expression, the probability, in the above
population of sequences, of a particular sequence
$(b_1, b_2, \ldots, b_{\nu-1})$ appearing in such a position that the
place distance between $a_1$ and $b_1$ is m symbols can be given by

$$T^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\nu-1}) = (a_1, \ldots, a_{\nu-1} \mid R^m \mid b_1, \ldots, b_{\nu-1}), \tag{5}$$

where $R^m$ simply means the m-th power of R in the sense of matrix
multiplication.
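The matrix (4) and its powers (5) can be sketched in Python as follows. This is only an illustration: the helper names `build_R`, `mat_mul` and `mat_pow`, the dictionary representation of Q, and the range-3 example on a two-letter alphabet are assumptions introduced here, not part of the paper.

```python
from fractions import Fraction
from itertools import product

def build_R(Q, alphabet, nu):
    """Matrix (4): rows and columns are indexed by (nu-1)-symbol sequences."""
    states = list(product(alphabet, repeat=nu - 1))
    index = {s: i for i, s in enumerate(states)}
    R = [[Fraction(0)] * len(states) for _ in states]
    for a in states:
        for symbol, p in Q.get(a, {}).items():
            # The deltas in (4) force b to be a shifted by one place.
            b = a[1:] + (symbol,)
            R[index[a]][index[b]] = Fraction(p)
    return states, R

def mat_mul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(R, m):
    """m-th power of R in the sense of matrix multiplication, as in (5)."""
    size = len(R)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(m):
        result = mat_mul(result, R)
    return result

# Hypothetical range-3 source on the alphabet {a, b}.
Q_EXAMPLE = {
    ("a", "a"): {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    ("a", "b"): {"a": Fraction(1, 4), "b": Fraction(3, 4)},
    ("b", "a"): {"a": Fraction(1)},
    ("b", "b"): {"a": Fraction(1, 3), "b": Fraction(2, 3)},
}

states, R = build_R(Q_EXAMPLE, "ab", 3)
R2 = mat_pow(R, 2)
# Probability of finding (b, b) two places after the initial pair (a, a).
t2 = R2[states.index(("a", "a"))][states.index(("b", "b"))]
```

Every row of R, and of each power of R, sums to one, which is the matrix counterpart of the normalization (2).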

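With the help of the quantity (5), we can further calculate the
probability of a given sequence of any length $(\mu - 1)$, say
$(b_1, \ldots, b_{\mu-1})$, appearing at any position after the
initial $(a_1, \ldots, a_{\nu-1})$. If $\mu > \nu$, this probability
will be

$$T^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\mu-1}) = T^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\nu-1})\, Q(b_1, \ldots, b_{\nu-1} \mid b_\nu) \cdots Q(b_{\mu-\nu}, \ldots, b_{\mu-2} \mid b_{\mu-1}), \tag{6}$$

where m stands for the symbol distance between $a_1$ and $b_1$.

If $\mu < \nu$, we have

$$T^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\mu-1}) = \sum_{b_\mu, \ldots, b_{\nu-1}} T^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\nu-1}), \tag{7}$$

where m bears the same meaning.

Now, the average probability of sequence $(b_1, \ldots, b_{\mu-1})$
with the "place-distance" not larger than m will be

$$U^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\mu-1}) = \frac{1}{m} \sum_{l=1}^{m} T^{(l)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\mu-1}). \tag{8}$$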
We now proceed to define what we mean by ergodicity in this
paper. We consider all the possible, infinitely long sequences which
start with a given initial sequence $(a_1, \ldots, a_{\nu-1})$ and
ask the average probability of the sequence
$(b_1, \ldots, b_{\mu-1})$ appearing in any position. This
probability evidently has the mathematical expression:

$$\lim_{m \to \infty} U^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\mu-1}). \tag{9}$$

The word average here implies a two-fold averaging, viz., first,
averaging over all the possible sequences with a fixed position
where the sequence $(b_1, \ldots, b_{\mu-1})$ should appear, and
second, averaging over all the possible positions of this sequence.
The first averaging is mathematically represented by the matrix
multiplication in (5), and the second averaging by the summation
in (8).

Definition II. If
$U^{(m)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\mu-1})$
converges, as $m \to \infty$, to a unique, non-vanishing limit
independent of $(a_1, \ldots, a_{\nu-1})$, where
$(a_1, \ldots, a_{\nu-1})$ can be taken arbitrarily from a certain
family of $(\nu - 1)$-symbol sequences and
$(b_1, \ldots, b_{\mu-1})$ can be taken arbitrarily from a certain
family of $(\mu - 1)$-symbol sequences, then we speak of ergodicity
with regard to these families.

We shall presently see that the quantity (9) with a fixed initial
sequence $(a_1, \ldots, a_{\nu-1})$ and a fixed final sequence
$(b_1, \ldots, b_{\mu-1})$ indeed converges to a limit, say:

$$U^{(\infty)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\mu-1}), \tag{10}$$

but this limit is not necessarily larger than zero, nor is it in
general necessarily independent of the initial sequence. In order to
understand the situation clearly, let us invoke some well-known
mathematical theorems regarding the Markoff chains.

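The ordinary Markoff chain formally pertains to a two-symbol
correlation probability $(\alpha \mid R \mid \beta)$,
$(\alpha, \beta = 1, 2, \ldots, M)$:

$$(\alpha \mid R \mid \beta) \ge 0, \qquad \sum_{\beta} (\alpha \mid R \mid \beta) = 1. \tag{11}$$

In accordance with the usual rule of matrix multiplication, we
further introduce

$$(\alpha \mid R^m \mid \beta) = \sum_{\kappa, \lambda, \ldots, \rho} (\alpha \mid R \mid \kappa)(\kappa \mid R \mid \lambda) \cdots (\rho \mid R \mid \beta). \tag{12}$$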

Then, we have the following theorems:

Theorem II. The quantity defined by

$$V^{(m)}(\alpha \mid \beta) = \frac{1}{m} \sum_{l=1}^{m} (\alpha \mid R^l \mid \beta) \tag{13}$$

for any given pair $(\alpha, \beta)$ converges to a limit as
$m \to \infty$:

$$\lim_{m \to \infty} V^{(m)}(\alpha \mid \beta) = V^{(\infty)}(\alpha \mid \beta). \tag{14}$$

1. See for instance W. Feller, Introduction to Probability Theory and
its Applications (John Wiley, New York, 1950), p. 307 ff.
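The role of the averaging in Theorem II can be illustrated with a small Python sketch. It assumes that the average is taken as $(1/m)\sum_{l=1}^{m} R^l$; the function name `cesaro_average` and the two-state periodic chain are hypothetical choices made for this example. For such a chain the individual powers of R oscillate and have no limit, whereas the averaged quantity settles down.

```python
from fractions import Fraction

def cesaro_average(R, m):
    """Return (1/m) * sum of R^l for l = 1..m (assumed form of the average)."""
    size = len(R)
    power = [row[:] for row in R]      # current power, starting at R^1
    total = [row[:] for row in R]      # running sum of the powers
    for _ in range(m - 1):
        power = [[sum(power[i][k] * R[k][j] for k in range(size))
                  for j in range(size)] for i in range(size)]
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
    return [[x / m for x in row] for row in total]

# Hypothetical periodic chain: state 0 always goes to 1, and 1 to 0.
PERIODIC = [[Fraction(0), Fraction(1)],
            [Fraction(1), Fraction(0)]]

averaged = cesaro_average(PERIODIC, 1000)   # every entry equals 1/2
```

With m = 1000 the sum contains 500 copies of R and 500 copies of the unit matrix, so each entry of the average is exactly one half.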

Theorem III. The entire set G of symbols
$(\alpha = 1, 2, \ldots, M)$ can be divided into a "vanishing" subset
V and a certain number of "closed" subsets $C_i$ $(i = 1, 2, \ldots)$
in such a way that

$V^{(\infty)}(\alpha \mid \beta) = 0$ for $\alpha$ belonging to G, and for $\beta$ belonging to V,

$V^{(\infty)}(\alpha \mid \beta) > 0$ for $\alpha$ and $\beta$ belonging to the same $C_i$,

$V^{(\infty)}(\alpha \mid \beta) = 0$ for $\alpha$ and $\beta$ belonging to different C's.

Theorem IV. $V^{(\infty)}(\alpha \mid \beta)$ is independent of
$\alpha$, if $\alpha$ and $\beta$ belong to the same C.
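For a finite chain, the division described in Theorem III can be computed from the pattern of non-zero transition probabilities alone. The sketch below is one possible way to do this, assuming the closed subsets are found as the sets of mutually reachable states that cannot be left; the function name `classify_states` and the four-state example matrix are hypothetical.

```python
def classify_states(R):
    """Split state indices into closed subsets and the vanishing subset."""
    n = len(R)
    # reach[i]: states reachable from i in zero or more steps.
    reach = [{j for j in range(n) if R[i][j] > 0} | {i} for i in range(n)]
    changed = True
    while changed:                      # transitive closure of reachability
        changed = False
        for i in range(n):
            new = set().union(*(reach[j] for j in reach[i]))
            if not new <= reach[i]:
                reach[i] |= new
                changed = True
    closed, vanishing = [], []
    for i in range(n):
        # i lies in a closed subset iff every state it can reach leads back to i.
        if all(i in reach[j] for j in reach[i]):
            subset = sorted(reach[i])
            if subset not in closed:
                closed.append(subset)
        else:
            vanishing.append(i)
    return closed, vanishing

# Hypothetical chain: state 0 is absorbing, states 1 and 2 alternate,
# and state 3 leaks into either closed subset.
EXAMPLE_R = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]

closed_subsets, vanishing_subset = classify_states(EXAMPLE_R)
```

In this example state 3 forms the vanishing subset: the limit of Theorem II is zero whenever it is the final state, whatever the initial state.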

Coming back to our original topic, if the correlation-range is
two, and if $\mu = \nu$, these theorems can be directly applied to our
problem involved in Definition II. If the correlation-range is > 2,
we only need to consider a sequence of $(\nu - 1)$ symbols
collectively as a symbol $\alpha$. The R's defined in (4) indeed
satisfy (11). The cases $\mu \neq \nu$ can be handled with the help
of (6) and (7).

From Theorem II follows quite generally:

Theorem V. The limit (10) exists.

We shall now discuss first the case $\mu = \nu$ in the light of
Theorems II, III and IV. According to Theorem III, the entire set
of $(\nu - 1)$-symbol sequences is subdivided into a vanishing subset
V and a certain number of closed subsets $C_i$. If the final sequence
of (10) belongs to V, then $U^{(\infty)}$ is zero independently of
the initial sequence. For a given final sequence belonging to one of
the closed subsets, $U^{(\infty)}$ will be zero if the initial
sequence belongs to another closed subset, and will have a constant
non-vanishing value insofar as the initial sequence belongs to the
same closed subset as the final sequence. Thus:

Theorem VI. When $\mu = \nu$, ergodicity in the sense of Def. II
holds if and only if the initial family and the final family are the
same closed subset.

In the cases where $\mu > \nu$, we construct an "extended" closed
subset $D_i$ of $(\mu - 1)$ symbols by taking those
$(\mu - 1)$-symbol sequences $(b_1, \ldots, b_{\mu-1})$ whose first
$(\nu - 1)$ symbols coincide with one of the members of the
$(\nu - 1)$-symbol closed subset $C_i$ and which satisfy the
condition:

$$Q(b_1, \ldots, b_{\nu-1} \mid b_\nu)\, Q(b_2, \ldots, b_\nu \mid b_{\nu+1}) \cdots Q(b_{\mu-\nu}, \ldots, b_{\mu-2} \mid b_{\mu-1}) \neq 0. \tag{15}$$

The extended vanishing subset will be composed of all those
$(\mu - 1)$-symbol sequences whose first $(\nu - 1)$ symbols coincide
with one of the members of the $(\nu - 1)$-symbol vanishing subset,
or whose first $(\nu - 1)$ symbols coincide with one of the members
of some closed subset but whose last $(\mu - \nu)$ symbols violate
the condition (15).

The entire set of possible $(\mu - 1)$-symbol sequences is thus
covered by the D's and the extended vanishing subset, and there is no
possible overlapping. If the $(\mu - 1)$-symbol final sequence of (10)
is a member of this extended vanishing subset, $U^{(\infty)}$ will
certainly vanish whatever the initial sequence may be. If the final
sequence belongs to an extended closed subset $D_i$, then
$U^{(\infty)}$ will vanish for an initial sequence belonging to a
$C_j$ different from the one, $C_i$, which corresponds to $D_i$, and
will have a constant non-vanishing value for any initial sequence
belonging to $C_i$.

Theorem VII. When $\mu > \nu$, ergodicity holds if and only if the
initial family is one of the closed subsets $C_i$ and the final family
is the extended closed subset $D_i$ corresponding to $C_i$.

In the cases where $\mu < \nu$, we encounter a rather peculiar
situation. From a closed subset $C_i$ we construct a retrenched
subset $E_i$ of $(\mu - 1)$-symbol sequences. $E_i$ is the set of
those $(\mu - 1)$-symbol sequences which coincide with the first
$(\mu - 1)$ symbols of at least one of the members of $C_i$. The
retrenched vanishing subset is defined as the totality of all those
$(\mu - 1)$-symbol sequences which do not belong to any one of the
retrenched closed subsets. In the case of the extended closed subsets,
a given sequence of $(\mu - 1)$ symbols could not belong to more than
one $D_i$, since the division made in Theorem III does not allow for
any overlapping. However, in the present case of retrenched subsets,
a given $(\mu - 1)$-symbol sequence may well belong to more than one
E. If the $(\mu - 1)$-symbol final sequence of (10) belongs to the
retrenched vanishing subset, $U^{(\infty)}$ will always vanish. If
the $(\mu - 1)$-symbol final sequence belongs to
$E_i, E_j, \ldots, E_k$, then $U^{(\infty)}$ will be zero for an
initial sequence belonging to a C different from any one of the
corresponding subsets $C_i, C_j, \ldots, C_k$. For the same final
sequence, $U^{(\infty)}$ may thus have different non-vanishing values
according to which one of $C_i, C_j, \ldots, C_k$ the initial
sequence belongs to.

Theorem VIII. When $\mu < \nu$, ergodicity holds for the initial
family identical with one of the closed subsets $C_i$ and the final
family identical with the corresponding retrenched subset $E_i$.

In the foregoing considerations, we have systematically omitted
the initial sequences belonging to the vanishing subset V. The reason
for this is that $U^{(\infty)}$ depends in this case on the detailed
structure of the intersymbol correlation, and that we cannot draw
a conclusion of general validity. (Of course, if the final sequence
also belongs to V, then $U^{(\infty)}$ vanishes.)

Regarding the closed subsets of $(\nu - 1)$ symbols, we should like
to mention the following interesting property. We have obviously

$$U^{(\infty)}(a_1, \ldots, a_{\nu-1} \mid b_2, \ldots, b_\nu) = \sum_{b_1} U^{(\infty)}(a_1, \ldots, a_{\nu-1} \mid b_1, \ldots, b_{\nu-1})\, Q(b_1, \ldots, b_{\nu-1} \mid b_\nu), \tag{16}$$

whence we infer:

Theorem IX. $(b_2, b_3, \ldots, b_\nu)$ is a member of $C_i$, if
there is any symbol $b_1$ such that $(b_1, b_2, \ldots, b_{\nu-1})$
is a member of $C_i$ and
$Q(b_1, b_2, \ldots, b_{\nu-1} \mid b_\nu) \neq 0$.

For a given $(b_1, b_2, \ldots, b_{\nu-1})$ there must be at least
one $b_\nu$ such that
$Q(b_1, b_2, \ldots, b_{\nu-1} \mid b_\nu) \neq 0$, on account of (2).
Hence:

Theorem X. If $(b_1, b_2, \ldots, b_{\nu-1})$ is a member of $C_i$,
then there is always a member of $C_i$ whose first $(\nu - 2)$
symbols are $(b_2, \ldots, b_{\nu-1})$.

Before closing this section, a simple illustration may be given.
Suppose the alphabet to be composed of three symbols, $S_1$, $S_2$
and $S_3$, and to have an intersymbol correlation of range 3:

$$\begin{aligned}
Q(S_1, S_1 \mid S_1) &= 1, & Q(S_1, S_2 \mid S_1) &= 1, & Q(S_1, S_3 \mid S_1) &= 1, \\
Q(S_2, S_1 \mid S_2) &= 1, & Q(S_2, S_2 \mid S_2) &= 1, & Q(S_2, S_3 \mid S_1) &= 1, \\
Q(S_3, S_1 \mid S_1) &= 1, & Q(S_3, S_2 \mid S_3) &= 1, & Q(S_3, S_3 \mid S_1) &= 1.
\end{aligned}$$
Then the $(\nu - 1)$-symbol subsets are:

$C_1$ : $(S_1, S_1)$
$C_2$ : $(S_1, S_2)$, $(S_2, S_1)$
$C_3$ : $(S_2, S_2)$
$V$ : $(S_1, S_3)$, $(S_3, S_1)$, $(S_2, S_3)$, $(S_3, S_2)$, $(S_3, S_3)$

The extended 3-symbol subsets are:

$D_1$ : $(S_1, S_1, S_1)$
$D_2$ : $(S_1, S_2, S_1)$, $(S_2, S_1, S_2)$
$D_3$ : $(S_2, S_2, S_2)$
$V'$ : all other 3-symbol sequences

The retrenched 1-symbol subsets are:

$E_1$ : $S_1$
$E_2$ : $S_1$, $S_2$
$E_3$ : $S_2$
$V$ : $S_3$

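We can see the overlapping we have discussed; as a result,
$U^{(\infty)}$ with the final sequence (symbol) $S_1$, for instance,
becomes three-valued:

$$\begin{aligned}
U^{(\infty)}(S_1, S_1 \mid S_1) &= 1, \\
U^{(\infty)}(S_1, S_2 \mid S_1) &= \tfrac{1}{2}, \\
U^{(\infty)}(S_2, S_1 \mid S_1) &= \tfrac{1}{2}, \\
U^{(\infty)}(S_2, S_2 \mid S_1) &= 0, \\
\text{all other } U^{(\infty)}(\,\cdot \mid S_1) &= 1.
\end{aligned}$$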
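The three values found for initial sequences in the closed subsets can be checked with a short Python sketch. It uses only the four correlation probabilities that act on pairs drawn from $S_1$ and $S_2$, namely $Q(S_1, S_1 \mid S_1)$, $Q(S_1, S_2 \mid S_1)$, $Q(S_2, S_1 \mid S_2)$ and $Q(S_2, S_2 \mid S_2)$, all equal to 1. Because the source is then deterministic, the averaged probability is simply the fraction of places occupied by the target symbol along the cycle that is eventually reached. The names `NEXT` and `limiting_frequency` are assumptions made for this example, and symbols are written as the integers 1 and 2.

```python
from fractions import Fraction

# Deterministic successor of each two-symbol context (1 stands for S_1, 2 for S_2).
NEXT = {(1, 1): 1, (1, 2): 1, (2, 1): 2, (2, 2): 2}

def limiting_frequency(initial_pair, target):
    """Long-run fraction of places occupied by `target`, starting from a pair."""
    seen = {}
    path = []
    state = initial_pair
    while state not in seen:            # follow the pairs until one repeats
        seen[state] = len(path)
        path.append(state)
        state = (state[1], NEXT[state])
    cycle = path[seen[state]:]          # the periodic part of the trajectory
    hits = sum(1 for pair in cycle if pair[0] == target)
    return Fraction(hits, len(cycle))

values = {pair: limiting_frequency(pair, 1) for pair in NEXT}
```

Starting from $(S_1, S_2)$ or $(S_2, S_1)$ the message alternates between the two symbols, which is why the value one half appears for the subset $C_2$.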
#3. Redundancy

In this section, we shall constantly use a quantity denoted by:

$$P(a_1, a_2, \ldots, a_n), \qquad n \ge 1. \tag{17}$$

Definition III. The quantity (17) represents the probability, in
infinitely long messages, of an arbitrarily taken sequence of
symbol-length n being a particular sequence $(a_1, a_2, \ldots, a_n)$.

From this definition follows the normalization condition:

$$\sum_{a_1, \ldots, a_n} P(a_1, a_2, \ldots, a_n) = 1. \tag{18}$$

According to the point of view of the last section, the existence
of a unique value of such a probability is not unconditionally
guaranteed. Only if the initial sequence $(b_1, \ldots, b_{\nu-1})$
is limited to within a closed subset, say $C_i$, does

$$U^{(\infty)}(b_1, \ldots, b_{\nu-1} \mid a_1, \ldots, a_n)$$

become independent of $(b_1, \ldots, b_{\nu-1})$, i.e., a function
only of $(a_1, \ldots, a_n)$. If this is the case, we can write

$$U^{(\infty)}(b_1, \ldots, b_{\nu-1} \mid a_1, \ldots, a_n) = P(a_1, \ldots, a_n). \tag{19}$$

According to the theorems of the last section, if
$(a_1, \ldots, a_n)$ belongs to $C_i$, or its extended subset $D_i$,
or its retrenched subset $E_i$, P will be finite, and otherwise zero.
We have therefore to restrict the "infinitely long messages" of
Definition III to only those which start with initial sequences
belonging to one closed subset. The condition regarding P does not
require that all the P's should be non-vanishing, hence the
restriction on the final sequences, in the sense of Definition II, is
not necessary. On account of ergodicity, two sequences starting from
two different initial sequences of the same closed subset become, in
the long run, statistically identical. It is true that we can evade
the restriction on the initial sequences by giving a certain "weight"
to each of the closed subsets, which would lead to a unique value of
each P. However, from the point of view that the messages are
engendered solely by the correlation probability, this alternative is
not acceptable, since it involves an arbitrary "weight" of each closed
subset. Our discussion in this section will be based on the assumption
that the initial sequences are limited to a single subset. The
generalization of the results to the case of "weighted" subsets is
very simple.

It should be noted that, as a result of the limitation of the
initial sequences to a single subset, it may well happen that some of
the generally possible sequences $(a_1, \ldots, a_{\nu-1})$ in the
correlation probability $Q(a_1, \ldots, a_{\nu-1} \mid a_\nu)$
actually never occur in the possible messages. Thus the actual range
of correlation may become smaller than the range defined with regard
to the entire possibilities of the a's.

For instance, in the illustration of the last section, if we limit
ourselves to the initial subset $C_2$, all 3-symbol Q's except
$Q(S_1, S_2 \mid S_1) = 1$ and $Q(S_2, S_1 \mid S_2) = 1$ will become
meaningless. These two 3-symbol correlation probabilities reduce to
the following two 2-symbol correlation probabilities:
$Q(S_1 \mid S_2) = 1$ and $Q(S_2 \mid S_1) = 1$. The range is thus
reduced from three to two.

In the empirical point of view, if a population of very long sample
messages is given, we can always evaluate (17) by just counting the
frequency of each segment $(a_1, \ldots, a_n)$. However, if we divide
this entire population into, say, two groups, the values of (17) may
be different in the two groups. This discrepancy may be caused by a
difference in correlation probabilities and/or by a difference in the
initial sequences.

We thus see that the problem of ergodicity is not irrelevant to the
empirical point of view. In this section, however, we assume that we
have a single population from which the quantities of the type (17)
are uniquely determined.

The quantity (17) has, besides (18), the property:

$$\sum_{a_1, \ldots, a_n} P(a_1, \ldots, a_n, b_1, \ldots, b_m) = P(b_1, \ldots, b_m). \tag{20}$$

This is obvious from the statistical point of view, but can also be
verified from the standpoint of (19).

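According to (6), we have for $n \ge \nu$

$$P(a_1, \ldots, a_n) = P(a_1, \ldots, a_{\nu-1})\, Q(a_1, \ldots, a_{\nu-1} \mid a_\nu)\, Q(a_2, \ldots, a_\nu \mid a_{\nu+1}) \cdots Q(a_{n-\nu+1}, \ldots, a_{n-1} \mid a_n), \tag{21}$$

or more generally,

$$P(a_1, \ldots, a_n) = P(a_1, \ldots, a_{\mu-1})\, Q(a_1, \ldots, a_{\mu-1} \mid a_\mu) \cdots Q(a_{n-\mu+1}, \ldots, a_{n-1} \mid a_n), \tag{22}$$

provided $n \ge \mu \ge \nu$. Equivalence of (21) and (22) can readily
be seen with the help of (3) and (6). In particular, for
$n = \mu \ge \nu$, we get from (22)

$$Q(a_1, \ldots, a_{\mu-1} \mid a_\mu) = \frac{P(a_1, \ldots, a_\mu)}{P(a_1, \ldots, a_{\mu-1})}. \tag{23}$$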

This is just what should be according to Definitions I and III.
(23) may be considered as the definition of
$Q(a_1, \ldots, a_{\mu-1} \mid a_\mu)$ even for $\mu < \nu$. However,
with such Q's with $\mu < \nu$, (22) will not be true, since the Q's
with $\mu < \nu$ cannot describe fully the existing correlation.

Substituting (23) into (22), we get

$$P(a_1, \ldots, a_n) = \frac{P(a_1, \ldots, a_\mu)\, P(a_2, \ldots, a_{\mu+1}) \cdots P(a_{n-\mu+1}, \ldots, a_n)}{P(a_2, \ldots, a_\mu)\, P(a_3, \ldots, a_{\mu+1}) \cdots P(a_{n-\mu+1}, \ldots, a_{n-1})}, \tag{24}$$

provided $n \ge \mu \ge \nu$. The actual range $\nu$ is thus the
minimum value of $\mu$ for which the decomposition (24) is allowed.

For an allowed value of $\mu$, if a further decomposition of range
$\mu - 1$ is still allowed, i.e., if $\mu - 1 \ge \nu$, then we get
from (24), applied with $n = \mu$ and range $\mu - 1$,

$$P(a_1, \ldots, a_\mu) = \frac{P(a_1, \ldots, a_{\mu-1})\, P(a_2, \ldots, a_\mu)}{P(a_2, \ldots, a_{\mu-1})} \tag{25}$$

for all $(a_1, \ldots, a_\mu)$. But if $\mu - 1 < \nu$, the left side
of (25) will not be equal to its right side for at least one sequence
$(a_1, \ldots, a_\mu)$.

Thus we are led to use (25) as a criterion to determine whether
$\mu > \nu$ or not: if (25) holds for all $(a_1, \ldots, a_\mu)$,
then $\mu > \nu$; if not, $\mu \le \nu$. Indeed, if (25) is possible,
we have in virtue of (23),

$$Q(a_1, \ldots, a_{\mu-1} \mid a_\mu) = \frac{P(a_1, \ldots, a_\mu)}{P(a_1, \ldots, a_{\mu-1})} = \frac{P(a_2, \ldots, a_\mu)}{P(a_2, \ldots, a_{\mu-1})} = Q(a_2, \ldots, a_{\mu-1} \mid a_\mu), \tag{26}$$

i.e., Q of range $\mu$ is reducible to a Q of range $(\mu - 1)$. In
the light of Theorem I, this means that the actual range is
$(\mu - 1)$ or less. If (25) breaks down for at least one sequence
$(a_1, \ldots, a_\mu)$, then (26) does not hold in general, meaning
that the actual range is larger than $(\mu - 1)$.

Theorem XI. If and only if (25) holds for all
$(a_1, \ldots, a_\mu)$, the actual correlation range $\nu$ is
$(\mu - 1)$ or less.

This criterion is interesting particularly in the empirical point
of view, for here the P's, instead of the Q's, are the quantities which
are primarily given. The criterion of Theorem XI can be brought to a
more concise form with the help of the well-known theorem attributed to
W. Gibbs:

Theorem XII. If

$$f_i \ge 0, \qquad g_i \ge 0, \qquad \text{and} \qquad \sum_i f_i = \sum_i g_i, \tag{27}$$

then

$$\sum_i f_i \log f_i - \sum_i f_i \log g_i \ge 0, \tag{28}$$

where the equality holds only when $f_i = g_i$ for all i.

Now, let us call the left-hand side and the right-hand side of
(25), respectively,

$$f(a_1, \ldots, a_\mu) = P(a_1, \ldots, a_\mu), \tag{29}$$

$$g(a_1, \ldots, a_\mu) = \frac{P(a_1, \ldots, a_{\mu-1})\, P(a_2, \ldots, a_\mu)}{P(a_2, \ldots, a_{\mu-1})}, \tag{30}$$

and consider the index i of Theorem XII as a collective index for the
various possible sequences of symbol-length $\mu$. On account of (18)
and (20), the conditions (27) are satisfied, and we obtain

$$W_\mu \equiv \sum P(a_1, \ldots, a_\mu) \log P(a_1, \ldots, a_\mu) - 2 \sum P(a_1, \ldots, a_{\mu-1}) \log P(a_1, \ldots, a_{\mu-1}) + \sum P(a_1, \ldots, a_{\mu-2}) \log P(a_1, \ldots, a_{\mu-2}) \ge 0. \tag{31}$$

Only when (25) holds for all $(a_1, \ldots, a_\mu)$ is $W_\mu = 0$.
In other words, for a given value of $\nu$, $W_\mu = 0$ for
$\mu > \nu$. This leads to a convenient way to determine the actual
range:

Theorem XIII. The actual range $\nu$ is the maximum value of $\mu$
for which $W_\mu \neq 0$.

The W's defined by (31) will be called "correlation indices".

For $\mu = 2$, the definition of $W_\mu$ in (31) should be understood
as meaning

$$W_2 = \sum P(a_1, a_2) \log P(a_1, a_2) - 2 \sum P(a_1) \log P(a_1), \tag{32}$$

for we have here $g(a_1, a_2) = P(a_1)\, P(a_2)$.
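In the empirical point of view the correlation indices can be estimated directly from block frequencies. The following Python sketch shows one way to do so. It is illustrative only: the function names are hypothetical, the sample message is read cyclically so that the counted frequencies satisfy (20) exactly, and logarithms are taken to base 2 so that the indices come out in bits.

```python
import math
from collections import Counter

def block_probabilities(message, k):
    """Relative frequencies of k-symbol blocks in a cyclically read message."""
    if k == 0:
        return {(): 1.0}                # the empty block has probability one
    n = len(message)
    counts = Counter(tuple(message[(i + j) % n] for j in range(k))
                     for i in range(n))
    return {block: c / n for block, c in counts.items()}

def plogp_sum(probs):
    """Sum of P log2 P over all blocks with non-zero frequency."""
    return sum(p * math.log2(p) for p in probs.values() if p > 0)

def correlation_index(message, mu):
    """Estimate of W_mu as in (31); for mu = 2 this reduces to (32)."""
    return (plogp_sum(block_probabilities(message, mu))
            - 2 * plogp_sum(block_probabilities(message, mu - 1))
            + plogp_sum(block_probabilities(message, mu - 2)))

sample = "ab" * 50                      # a strictly alternating message
w2 = correlation_index(sample, 2)       # one bit
w3 = correlation_index(sample, 3)       # zero: the range is two
```

For the alternating message the largest $\mu$ with a non-zero index is 2, in agreement with Theorem XIII for a source in which each symbol is fixed by its predecessor.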

We shall now proceed to find out the average amount of information
carried by a message-segment of length n in a language in which the
P's exist. A specific message-segment $(a_1, \ldots, a_n)$ has
probability $P(a_1, \ldots, a_n)$. Thus the information per symbol
carried by this message-segment is

$$-\frac{1}{n} \log P(a_1, \ldots, a_n).$$

The probability of occurrence of such a message being
$P(a_1, \ldots, a_n)$, the average information per symbol for various
possible message-segments of length n is given by

$$I_n = -\frac{1}{n} \sum P(a_1, \ldots, a_n) \log P(a_1, \ldots, a_n). \tag{33}$$

Now, if the existing correlation is of range $\nu$, the P can be
decomposed as in (24) with $\mu = \nu$. A straightforward calculation
with the help of (18) and (20) gives

$$I_n = I_{n,\nu} \equiv -\frac{n - \nu + 1}{n} \sum P(a_1, \ldots, a_\nu) \log P(a_1, \ldots, a_\nu) + \frac{n - \nu}{n} \sum P(a_1, \ldots, a_{\nu-1}) \log P(a_1, \ldots, a_{\nu-1}). \tag{34}$$

For an obvious reason this $\nu$ can be the actual minimum range or
any $\nu$ that is larger than this. Supposing $\nu$ in (34) to be the
actual minimum range, let us find the error which would be committed
by the calculation based on the assumption that the actual range were
$\nu - 1$. This is easily found to be

$$I_{n,\nu} - I_{n,\nu-1} = -\frac{n - \nu + 1}{n}\, W_\nu. \tag{35}$$

Repeating this process, we obtain

$$I_n - I^0 = I_{n,\nu} - I_{n,1} = -\sum_{\mu=2}^{\nu} \frac{n - \mu + 1}{n}\, W_\mu, \tag{36}$$

where

$$I^0 \equiv I_{n,1} = -\sum P(a_1) \log P(a_1). \tag{37}$$

Since $W_\mu$ vanishes anyway for $\mu > \nu$, we can state:

Theorem XIV. The average information per symbol carried by a
message-segment of length n is

$$I_n = I^0 - \sum_{\mu=2}^{n} \frac{n - \mu + 1}{n}\, W_\mu, \tag{38}$$

insofar as n is larger than the actual correlation range.

Since the W's are zero or positive, the intersymbol correlation
tends to decrease the amount of information. Thus, $W_\mu$ can be
considered to represent the "strength" of correlation: strength in
the sense of reducing the amount of information. By definition, $I_n$
cannot be negative, hence there is an upper limit to the total
"strength" of the correlation:

$$0 \le \sum_{\mu=2}^{\infty} W_\mu \le I^0. \tag{39}$$

For $n \gg \nu$, we obtain from (38)

$$I_n \approx I_\infty = I^0 - \sum_{\mu=2}^{\infty} W_\mu \qquad (n \gg \nu), \tag{40}$$

showing that if we take a sufficiently long segment as a unit, the
information per symbol becomes independent of the length of the
segment. This indirectly justifies the usual procedure according to
which an infinitely long message is cut into segments of sufficient
length and the segments are treated as if they did not have any
correlation among them.
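The relation between the average information per symbol and the correlation indices can be checked numerically. The Python sketch below does this for a hypothetical two-state source of range two (an ordinary Markoff chain started in its stationary distribution); the transition probabilities, the function names, and the choice of base-2 logarithms are all assumptions made for the example. It evaluates $I_n$ once directly from (33) and once from $I^0$ and the W's as in (38), with the sum running up to $\mu = n$.

```python
import math
from itertools import product

# Hypothetical range-2 source: TRANS[x][y] is the probability that y follows x.
TRANS = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
PI = {0: 0.8, 1: 0.2}   # stationary one-symbol probabilities for TRANS

def block_prob(block):
    """P(a_1, ..., a_k) for the stationary chain; 1 for the empty block."""
    if not block:
        return 1.0
    p = PI[block[0]]
    for x, y in zip(block, block[1:]):
        p *= TRANS[x][y]
    return p

def plogp_sum(k):
    """Sum of P log2 P over all k-symbol blocks."""
    return sum(block_prob(b) * math.log2(block_prob(b))
               for b in product((0, 1), repeat=k))

def info_direct(n):
    """Average information per symbol, computed as in (33)."""
    return -plogp_sum(n) / n

def correlation_index(mu):
    """W_mu as in (31)."""
    return plogp_sum(mu) - 2 * plogp_sum(mu - 1) + plogp_sum(mu - 2)

def info_from_indices(n):
    """Average information per symbol, computed as in (38)."""
    i0 = -plogp_sum(1)
    return i0 - sum((n - mu + 1) / n * correlation_index(mu)
                    for mu in range(2, n + 1))

direct = info_direct(6)
via_indices = info_from_indices(6)      # agrees with `direct` up to rounding
```

For this source only $W_2$ differs from zero, so the actual range is two, and the two evaluations of $I_6$ coincide.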

The quantity called "redundancy" is defined by$^2$

$$R = \frac{I^0 - I_\infty}{I^0}. \tag{41}$$

Theorem XV. The redundancy of a language which is characterized
by the correlation indices $W_\mu$ is given by

$$R = \frac{1}{I^0} \sum_{\mu=2}^{\infty} W_\mu, \qquad 0 \le R \le 1. \tag{42}$$

In the illustration of the last section, if we limit the initial
sequences to $C_2$, we get

$$W_2 = \log 2, \qquad W_3 = W_4 = \cdots = 0,$$

$$I^0 = \log 2, \qquad I_\infty = 0, \qquad R = 100\%.$$

This last result is not surprising, because the possible infinite
sequences are limited to $S_1 S_2 S_1 S_2 \ldots$, which certainly
cannot convey any information.

2. Stanford Goldman, Information Theory (Prentice-Hall, New York, 1953),
p. 45.